add nemo_bridge #1050

Open
chai-xiaonan wants to merge 8 commits into flagos-ai:main from chai-xiaonan:add_nemo_bridge

@chai-xiaonan

Rebuild Nemo-Bridge on top of the restructured flagscale version. flagscale now supports a subset of nemo-bridge functionality, which lets the framework load and save checkpoints in the HF format during training. This version also adds a new option, save_hf_interval, which sets how many iterations elapse between saves of the HF weights. Accuracy has been verified as correct on Deepseek V3 16_a3B, Qwen3-32B, and Qwen3-0.6B.
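
As a rough illustration of the interval setting described above, the check could look something like the sketch below. The helper name `should_save_hf`, the skipping of iteration 0, and the treatment of an unset or non-positive interval as "disabled" are assumptions made for this example; they are not taken from the PR's code.

```python
from typing import Optional


def should_save_hf(iteration: int, save_hf_interval: Optional[int]) -> bool:
    """Return True when HF-format weights would be exported at this iteration.

    Hypothetical helper: the real PR may wire save_hf_interval differently.
    """
    # Assumption: a missing or non-positive interval disables HF export.
    if not save_hf_interval or save_hf_interval <= 0:
        return False
    # Assumption: iteration 0 is skipped because nothing has been trained yet.
    return iteration > 0 and iteration % save_hf_interval == 0


# Iterations in 1..10 at which an export would be triggered with an interval of 5.
due = [it for it in range(1, 11) if should_save_hf(it, 5)]
print(due)  # [5, 10]
```

Keeping the decision in one small pure function makes it easy to test independently of the training loop.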

#Load the HF model from config
config_load = args.hf_config_path
config = safe_load_config_with_retry(config_load, trust_remote_code=False)
bridge = AutoBridge.from_hf_config(config)
Collaborator

Will this save-ckpt step allocate extra GPU memory when initializing an HF model?

bridge.load_hf_weights(ddp_model)
# no optimizer weight
iteration=0
num_floating_point_operations_so_far=0
Collaborator

Please add a print_rank_0 call here.
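
For readers unfamiliar with the pattern being requested, here is a minimal stand-in for rank-0-only logging. It is a sketch only: the name `log_on_rank_0` is hypothetical, and it reads the rank from the `RANK` environment variable (which launchers such as torchrun export) instead of querying the distributed backend the way the framework's own helper would.

```python
import os


def current_rank() -> int:
    # Assumption: the launcher exports RANK; a single-process run has no RANK,
    # so we default to 0.
    return int(os.environ.get("RANK", "0"))


def log_on_rank_0(message: str) -> bool:
    """Print `message` only on rank 0 and report whether it was printed."""
    if current_rank() == 0:
        print(message, flush=True)
        return True
    return False


# Simulate a non-zero rank, then rank 0.
os.environ["RANK"] = "1"
printed_on_rank_1 = log_on_rank_0("loading HF weights, no optimizer state")
os.environ["RANK"] = "0"
printed_on_rank_0 = log_on_rank_0("loading HF weights, no optimizer state")
```

Logging from a single rank keeps the output readable when many processes execute the same checkpoint-loading branch.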

# use megatron bridge
from megatron.nemo_bridge.models import AutoBridge
bridge=AutoBridge.from_hf_pretrained(load_dir)
bridge.load_hf_weights(ddp_model)
Collaborator

Can nemo-bridge's load_hf_weights handle ddp_model directly when ddp_model is wrapped by DistributedDataParallel?
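
If it turns out that the wrapped model cannot be passed directly, one common workaround is to strip the wrappers first. The sketch below assumes each wrapper exposes the inner model through a `.module` attribute (as torch's DistributedDataParallel does) and that the model may be passed as a list of chunks. `strip_wrappers` and the two toy classes are hypothetical names used only for this example.

```python
def strip_wrappers(model):
    """Peel off nested wrappers that expose the inner model as `.module`.

    Accepts a single model or a list/tuple of model chunks.
    """
    if isinstance(model, (list, tuple)):
        return [strip_wrappers(chunk) for chunk in model]
    # Keep descending while the object looks like a wrapper.
    while hasattr(model, "module"):
        model = model.module
    return model


class ToyModel:
    """Stand-in for the underlying model (no `.module` attribute)."""


class ToyWrapper:
    """Stand-in for a DDP-style wrapper that stores the inner model."""

    def __init__(self, module):
        self.module = module


inner = ToyModel()
wrapped = ToyWrapper(ToyWrapper(inner))  # two wrapper layers, e.g. DDP around a precision wrapper
assert strip_wrappers(wrapped) is inner
assert strip_wrappers([wrapped, inner]) == [inner, inner]
```

Whether unwrapping is needed at all depends on how the bridge resolves parameter names, which is exactly what the question above asks the author to confirm.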

@@ -0,0 +1,8 @@
# Copyright (c) 2025, BAAI. All rights reserved.
Collaborator

NeMo Megatron-Bridge can be installed with pip; see https://pypi.org/project/megatron-bridge/
Please remove the copied source code.

@@ -0,0 +1,8 @@
# Copyright (c) 2025, BAAI. All rights reserved.
Collaborator

Rename flagscale/train/megatron/nemo_bridge to flagscale/train/megatron/bridge so that it matches the import pattern from megatron.bridge

Contributor

tengqm left a comment

When copying source code from other repositories, we are obliged to copy their copyright notice as well. We cannot claim copyright for this code.
The original code has the following copyright header, which must be preserved:

# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -0,0 +1,110 @@
# Copyright (c) 2025, BAAI. All rights reserved.
#
# Copied from: https://github.com/NVIDIA-NeMo/Megatron-Bridge
Contributor

If Megatron-Bridge has a copyright claim, we are supposed to paste their copyright statements here.


if not has_implementation:
raise ValueError(
f"\n�~\~W Model architecture '{architecture}' is not yet supported\n\n"
Contributor

What are these weird characters?
There are some other similar cases in this string.
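
A quick way to locate every such case is to scan the copied files for the Unicode replacement character (U+FFFD) and stray C1 control characters, both of which are typical traces of a multi-byte UTF-8 symbol damaged during copy/paste. This is a minimal sketch; the function name `find_suspect_lines` and the sample text are made up for illustration.

```python
def find_suspect_lines(text: str):
    """Return (line_number, line) pairs that contain likely encoding debris."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        has_replacement_char = "\ufffd" in line
        # C1 control characters (U+0080..U+009F) rarely appear in source on purpose.
        has_c1_control = any(0x80 <= ord(ch) <= 0x9F for ch in line)
        if has_replacement_char or has_c1_control:
            hits.append((lineno, line))
    return hits


# Sample text: line 2 mimics the damaged string from the diff above.
sample = 'ok = 1\nmsg = "\ufffd~\\~W Model architecture is not yet supported"\n'
suspect_line_numbers = [lineno for lineno, _ in find_suspect_lines(sample)]
print(suspect_line_numbers)  # [2]
```

Legitimate non-ASCII symbols are not flagged by this check, so it can be run over a whole directory without drowning in false positives.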

@@ -0,0 +1,359 @@
# Copyright (c) 2025, BAAI. All rights reserved.
#
# Mainly adapted from: https://github.com/NVIDIA-NeMo/Megatron-Bridge
Contributor

Please clarify what has been "borrowed".
Please also paste the original copyright claim here if the code was not originally written by us.

@@ -0,0 +1,202 @@
# Copyright (c) 2025, BAAI. All rights reserved.
Contributor

It looks to me like this file was largely adapted from flagscale/train/megatron/nemo_bridge/models/conversion/auto_bridge.py. We copied the source and are now claiming copyright for the code. This is not acceptable.

We can borrow code from other projects, provided that the license terms grant us this right. Even then, we still have to give credit to the original authors and are obliged to mention their copyrights.

There are some weird characters in this file, obviously the result of a character conversion problem during copy/paste. Please fix them as well.

@chai-xiaonan
Author

chai-xiaonan commented Feb 5, 2026 via email
